Insights into the Fallback Path of Best-Effort Hardware Transactional Memory Systems

Author: DJ Sorin
JM Mellor-Crummey
M Martin
MMK Martin
P Magnusson
T Harris
Y Afek
Publication venue: Springer International Publishing
Publication date: 24/08/2016

DOI: 10.1007/978-3-319-43659-3

Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code. In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is minimized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
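
The abstract mentions ticket locks that keep a precise count of the transactions waiting to enter the fallback path. The following is a minimal sketch of a generic ticket lock, not the paper's implementation: the class name `TicketLock` and the `pending()` helper are hypothetical, and a `threading.Condition` stands in for the atomic fetch-and-increment and spinning that a hardware-level lock would typically use.

```python
import threading


class TicketLock:
    """Illustrative FIFO ticket lock (assumed design, not taken from the paper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket number currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            # Take a ticket; the condition's mutex makes this step atomic.
            ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until our ticket number comes up (FIFO order).
            while self._now_serving != ticket:
                self._cond.wait()
        return ticket

    def release(self):
        with self._cond:
            # Pass the lock to the next ticket holder, if any.
            self._now_serving += 1
            self._cond.notify_all()

    def pending(self):
        """Tickets issued but not yet released: the holder plus all waiters."""
        with self._cond:
            return self._next_ticket - self._now_serving
```

Because tickets are handed out in order, the difference between the two counters gives an exact count of threads that hold or are waiting for the lock, which is the kind of information a plain test-and-set lock does not expose.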

Crossref

Repositorio Institucional Universidad de Málaga

LNCS

Author: A Sánchez
CS Ellis
D Dice
G Evangelidis
I Jaluta
JM Mellor-Crummey
M Desnoyers
M Herlihy
O Rodeh
R Bayer
V Srinivasan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Concurrent accesses to shared data structures must be synchronized to avoid data races. Coarse-grained synchronization, which locks the entire data structure, is easy to implement but does not scale. Fine-grained synchronization can scale well, but can be hard to reason about. Hand-over-hand locking, in which operations are pipelined as they traverse the data structure, combines fine-grained synchronization with ease of use. However, the traditional implementation suffers from inherent overheads. This paper introduces snapshot-based synchronization (SBS), a novel hand-over-hand locking mechanism. SBS decouples the synchronization state from the data, significantly improving cache utilization. Further, it relies on guarantees provided by pipelining to minimize synchronization that requires cross-thread communication. Snapshot-based synchronization thus scales much better than traditional hand-over-hand locking, while maintaining the same ease of use.
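
For readers unfamiliar with the baseline, here is a minimal sketch of traditional hand-over-hand locking on a sorted linked list. It illustrates the conventional scheme the abstract compares against, not SBS itself; the names `Node`, `SortedList`, `insert`, and `keys` are hypothetical, and each node carries its own lock, which is the coupling of synchronization state and data that the abstract says SBS avoids.

```python
import threading


class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node
        self.lock = threading.Lock()   # per-node lock stored with the data


class SortedList:
    """Sorted set of keys using traditional hand-over-hand (lock coupling)."""

    def __init__(self):
        self.head = Node(float("-inf"))   # sentinel, never removed

    def insert(self, key):
        pred = self.head
        pred.lock.acquire()
        curr = pred.next
        while curr is not None:
            curr.lock.acquire()       # lock the successor before letting go of pred
            if curr.key >= key:
                break                 # insertion point found; pred and curr are locked
            pred.lock.release()       # hand over: drop the lock behind us
            pred = curr
            curr = curr.next
        try:
            if curr is not None and curr.key == key:
                return False          # key already present
            pred.next = Node(key, curr)
            return True
        finally:
            if curr is not None:
                curr.lock.release()
            pred.lock.release()

    def keys(self):
        # Unsynchronized traversal; only meant for use when no writers are running.
        out, node = [], self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out
```

Every thread acquires locks in list order, so threads cannot deadlock, and several inserts can proceed at once in different parts of the list. The cost is one lock acquisition and release per visited node, which is the overhead the abstract refers to.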

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Active memory controller

Author: A Ailamaki
A Gottlieb
A Saulsbury
Ali Ibrahim
C Batten
C Cascaval
D Kim
D Patterson
DH Albonesi
DJ Sorin
DS Nikolopoulos
F Petrini
G Blelloch
G Marin
I Zotov
J Kuskin
J Laudon
J Torrellas
JB Brockman
JH Ahn
JM Mellor-Crummey
John B. Carter
K Keeton
KM Chandy
L Zhang
L Zhao
LA Barroso
Lixin Zhang
M Garzaran
M Hall
M Hao
M Oskin
Michael A. Parker
P Kogge
PA Boncz
R Kalla
RE Kessler
S Chatterjee
S Kumar
S Scott
Sally A. McKee
T Anderson
T Eicken von
V Tipparaju
Xiaowei Jiang
Y Solihin
Z Fang
Zhen Fang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs' performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50x faster barriers, 12x faster spinlocks, 8.5x-15x faster stream/array operations, and 3x faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.

Crossref

Chalmers Research

Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers

Author: Agustín de Dios
Benjamín Sahelices
D Lenoski
DD James
G Graunke
JM Mellor-Crummey
José María Llabería
M Chaudhuri
M Heinrich
Pablo Ibáñez
TE Anderson
Víctor Viñals-Yúfera
W Hu
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref